SCGPred: A Score-based Method for Gene Structure Prediction by Combining Multiple Sources of Evidence

Authors

  • Xiao Li
  • Qing'an Ren
  • Yang Weng
  • Haoyang Cai
  • Yunmin Zhu
  • Yi-Zheng Zhang

Abstract

Predicting protein-coding genes remains a significant challenge. Although a variety of computational programs, most of them based on machine learning methods, have emerged, their prediction accuracy remains low when they are applied to large genomic sequences. Moreover, computational gene finding in newly sequenced genomes is especially difficult because no training set of abundant validated genes is available. Here we present a new gene-finding program, SCGPred, which improves prediction accuracy by combining multiple sources of evidence. SCGPred can run as a supervised method on previously well-studied genomes and as an unsupervised method on novel genomes. In tests on datasets composed of large DNA sequences from human and from the novel genome of Ustilago maydis, SCGPred achieves a significant improvement over popular ab initio gene predictors. We also demonstrate that SCGPred can significantly improve prediction in novel genomes by combining several foreign gene finders with similarity alignments, outperforming other unsupervised methods. SCGPred can therefore serve as an alternative gene-finding tool for newly sequenced eukaryotic genomes. The program is freely available at http://bio.scu.edu.cn/SCGPred/.
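
The abstract does not specify SCGPred's scoring function. The following sketch illustrates one generic form of score-based combination of evidence, and it is not SCGPred's actual algorithm. Each candidate exon receives the summed weight of the evidence sources that report it, and weighted interval scheduling then picks the non-overlapping set of exons with the highest total score. The source names, the weights, and the exact-coordinate matching are all hypothetical assumptions. A real gene finder would also have to enforce splice-site, reading-frame and strand consistency.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical representation: a candidate exon is (start, end), 1-based inclusive,
# on a single strand. Real pipelines also track strand, frame and splice signals.
Exon = tuple[int, int]


def score_exons(evidence: dict[str, list[Exon]], weights: dict[str, float]) -> dict[Exon, float]:
    """Give each exon the summed weight of every source reporting exactly those coordinates."""
    scores: dict[Exon, float] = defaultdict(float)
    for source, exons in evidence.items():
        w = weights.get(source, 0.0)  # sources without an assumed weight contribute nothing
        for exon in set(exons):  # count each source at most once per exon
            scores[exon] += w
    return dict(scores)


def best_exon_chain(scores: dict[Exon, float], min_score: float = 0.0) -> list[Exon]:
    """Select non-overlapping exons with maximal total score (weighted interval scheduling)."""
    exons = sorted((e for e, s in scores.items() if s > min_score), key=lambda e: e[1])
    ends = [end for _, end in exons]
    best = [0.0] * (len(exons) + 1)  # best[i]: best total score using the first i exons
    take = [False] * (len(exons) + 1)
    prev = [0] * (len(exons) + 1)
    for i, (start, end) in enumerate(exons, 1):
        # Count of earlier exons that end before this one starts (i.e. do not overlap it).
        j = bisect_right(ends, start - 1, 0, i - 1)
        with_it = best[j] + scores[(start, end)]
        if with_it > best[i - 1]:
            best[i], take[i], prev[i] = with_it, True, j
        else:
            best[i] = best[i - 1]
    # Walk back through the table to recover the chosen exons.
    chain: list[Exon] = []
    i = len(exons)
    while i > 0:
        if take[i]:
            chain.append(exons[i - 1])
            i = prev[i]
        else:
            i -= 1
    return chain[::-1]


if __name__ == "__main__":
    # Hypothetical evidence from two ab initio predictors and one EST alignment.
    evidence = {
        "predictorA": [(100, 200), (300, 400)],
        "predictorB": [(100, 200), (350, 450)],
        "est_align": [(100, 200), (300, 400)],
    }
    weights = {"predictorA": 1.0, "predictorB": 0.8, "est_align": 1.5}
    print(best_exon_chain(score_exons(evidence, weights)))
```

With these made-up inputs, the exon (350, 450) is supported by only one low-weight source and overlaps the better-supported (300, 400), so the chain keeps (100, 200) and (300, 400). The weights are where supervised and unsupervised settings would differ: they could be fit on validated genes in a well-studied genome or set heuristically in a novel one. This is an assumption about how such a scheme might be used, not a description of SCGPred.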

Similar sources

Link Prediction using Network Embedding based on Global Similarity

Background: Link prediction is one of the most widely studied problems in complex network analysis. It requires knowledge of previous link connections, combined with the available information. Local link prediction approaches that rely on node structure are fast but not accurate enough. On the other hand, the global link predicti...

Novel consensus quantitative structure-retention relationship method in prediction of pesticides retention time in nano-LC

In this study, the quantitative structure-retention relationship (QSRR) methodology was employed to model the retention times of 16 banned pesticides on a nano-liquid chromatography (nano-LC) column. A genetic algorithm-multiple linear regression (GA-MLR) method was used to develop global and consensus QSRR models. The best global GA-MLR model was established by adjusting the GA parameters. Three de...

Computational gene prediction using multiple sources of evidence.

This article describes a computational method to construct gene models by using evidence generated from a diverse set of sources, including those typical of a genome annotation pipeline. The program, called Combiner, takes as input a genomic sequence and the locations of gene predictions from ab initio gene finders, protein sequence alignments, expressed sequence tag and cDNA alignments, splice...

Long-step-ahead wind speed prediction based on a hybrid RNNGA model

For proper and efficient utilization of wind power, predicting wind speed is very important. Wind is one of the main sources of energy in the world, but wind turbines lack reliability, continuity and homogeneity in power production. In addition, sudden changes in wind speed put the health of wind turbine units at risk. Therefore, the prediction of wind speed for turbin...

O-3: Drug Repositioning by Merging Gene Expression Data Analysis and Cheminformatics Target Prediction Approaches

The transcriptional responses to drug treatments, combined with a protein target prediction algorithm, were used to associate compounds with biological genomic space. This enabled us to predict the efficacy of compounds in cMap and LINCS against 181 disease databases extracted from GEO. 18/30 of top drugs predicted for leukemia (e.g. Leflunomide and Etoposide) and breast cancer (e.g. Tamoxifen a...

Volume 6

Publication date: 2008